Pesquisa | Portal Regional da BVS

RawHash: enabling fast and accurate real-time analysis of raw nanopore signals for large genomes.

Firtina, Can; Mansouri Ghiasi, Nika; Lindegger, Joel; Singh, Gagandeep; Cavlak, Meryem Banu; Mao, Haiyu; Mutlu, Onur.

Bioinformatics ; 39(39 Suppl 1): i297-i307, 2023 06 30.

Artigo em Inglês | MEDLINE | ID: mdl-37387139

RESUMO

Nanopore sequencers generate electrical raw signals in real-time while sequencing long genomic strands. These raw signals can be analyzed as they are generated, providing an opportunity for real-time genome analysis. An important feature of nanopore sequencing, Read Until, can eject strands from sequencers without fully sequencing them, which provides opportunities to computationally reduce the sequencing time and cost. However, existing works utilizing Read Until either (i) require powerful computational resources that may not be available for portable sequencers or (ii) lack scalability for large genomes, rendering them inaccurate or ineffective. We propose RawHash, the first mechanism that can accurately and efficiently perform real-time analysis of nanopore raw signals for large genomes using a hash-based similarity search. To enable this, RawHash ensures the signals corresponding to the same DNA content lead to the same hash value, regardless of the slight variations in these signals. RawHash achieves an accurate hash-based similarity search via an effective quantization of the raw signals such that signals corresponding to the same DNA content have the same quantized value and, subsequently, the same hash value. We evaluate RawHash on three applications: (i) read mapping, (ii) relative abundance estimation, and (iii) contamination analysis. Our evaluations show that RawHash is the only tool that can provide high accuracy and high throughput for analyzing large genomes in real-time. When compared to the state-of-the-art techniques, UNCALLED and Sigmap, RawHash provides (i) 25.8× and 3.4× better average throughput and (ii) significantly better accuracy for large genomes, respectively. Source code is available at https://github.com/CMU-SAFARI/RawHash.

Assuntos

Sequenciamento por Nanoporos , Nanoporos , Genômica , Ploidias , DNA

Scrooge: a fast and memory-frugal genomic sequence aligner for CPUs, GPUs, and ASICs.

Lindegger, Joël; Senol Cali, Damla; Alser, Mohammed; Gómez-Luna, Juan; Ghiasi, Nika Mansouri; Mutlu, Onur.

Bioinformatics ; 39(5)2023 05 04.

Artigo em Inglês | MEDLINE | ID: mdl-36961334

RESUMO

MOTIVATION: Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. RESULTS: We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes three novel algorithmic improvements which reduce the data movement, memory footprint, and the number of operations in the GenASM algorithm. We provide efficient open-source implementations of the Scrooge algorithm for CPUs and GPUs, which demonstrate the significant benefits of our algorithmic improvements. For long reads, the CPU version of Scrooge achieves a 20.1×, 1.7×, and 2.1× speedup over KSW2, Edlib, and a CPU implementation of GenASM, respectively. The GPU version of Scrooge achieves a 4.0×, 80.4×, 6.8×, 12.6×, and 5.9× speedup over the CPU version of Scrooge, KSW2, Edlib, Darwin-GPU, and a GPU implementation of GenASM, respectively. We estimate an ASIC implementation of Scrooge to use 3.6× less chip area and 2.1× less power than a GenASM ASIC while maintaining the same throughput. Further, we systematically analyze the throughput and accuracy behavior of GenASM and Scrooge under various configurations. As the best configuration of Scrooge depends on the computing platform, we make several observations that can help guide future implementations of Scrooge. AVAILABILITY AND IMPLEMENTATION: https://github.com/CMU-SAFARI/Scrooge.

Assuntos

Algoritmos , Computadores , Genoma , Genômica , Biologia Computacional

From molecules to genomic variations: Accelerating genome analysis via intelligent algorithms and architectures.

Alser, Mohammed; Lindegger, Joel; Firtina, Can; Almadhoun, Nour; Mao, Haiyu; Singh, Gagandeep; Gomez-Luna, Juan; Mutlu, Onur.

Comput Struct Biotechnol J ; 20: 4579-4599, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-36090814

RESUMO

We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently exist major computational bottlenecks and inefficiencies throughout the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are still not able to read a genome in its entirety. We describe the ongoing journey in significantly improving the performance, accuracy, and efficiency of genome analysis using intelligent algorithms and hardware architectures. We explain state-of-the-art algorithmic methods and hardware-based acceleration approaches for each step of the genome analysis pipeline and provide experimental evaluations. Algorithmic approaches exploit the structure of the genome as well as the structure of the underlying hardware. Hardware-based acceleration approaches exploit specialized microarchitectures or various execution paradigms (e.g., processing inside or near memory) along with algorithmic changes, leading to new hardware/software co-designed systems. We conclude with a foreshadowing of future challenges, benefits, and research directions triggered by the development of both very low cost yet highly error prone new sequencing technologies and specialized hardware chips for genomics. We hope that these efforts and the challenges we discuss provide a foundation for future work in making genome analysis more intelligent.

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA